THE CORRELATION OF ECONOMIC STATISTICS.

By Warren M. Persons, Assistant Professor of Economics, Dartmouth 

College. 



Economics deals mainly with correlation rather than with simple causation;
necessity of a method of measuring the correspondence between two series of
statistics; illustrations of the use of the graphic method; the coefficient of
correlation defined and illustrated; the coefficient of correlation as a
measure of the grouping of points about the line of regression; the equation
of the line of regression; the coefficients of correlation computed for the
illustrations cited above; two influences affect the size of the coefficient
of correlation, i.e., short-time fluctuations and secular tendency; three
methods of isolating the two influences are described and illustrated; the
measurement of the correlation among three variables; conclusion.

The cause and effect relation existing between economic 
events is especially difficult to ascertain because of the pres- 
ence of innumerable variable elements. In solving his prob- 
lems the economist can not, like the physicist or chemist, 
eliminate all causes except one and then by experiment 
determine the effect of that one. Causes must be dealt with 
en masse. Since any effect is the result of many combined 
causes the economist is never sure that a given effect will 
follow a given cause. In stating an economic law he always 
has to postulate "other things remaining the same," with, 
perhaps, little appreciation of what the other things may be. 
It is rarely, if ever, possible for the economist to state more 
than "such and such a cause tends to produce such and such 
an effect." Events can only be stated to be more or less 
probable. He is dealing mainly, therefore, with correlation 
and not with simple causation. 

The problems of economics are similar to certain problems
of biology, such as the effect of environment and heredity 
upon the individual. In dealing with the question of heredity 
Karl Pearson says:* "Taking our stand then on the observed 
fact that a knowledge neither of parents nor of the whole 
ancestry will enable us to predict with certainty in a variety of 
important cases the character of the individual offspring we 
ask: What is the correct method of dealing with the problem 
of heredity in such cases? The causes A, B, C, D, E . . . 
which we have as yet succeeded in isolating and defining 
are not always followed by the effect X, but by any one of 
the effects U, V, W, X, Y, Z. We are therefore not dealing 
with causation but correlation, and there is therefore only one 
method of procedure possible; we must collect statistics of the 
frequency with which U, V, W, X, Y, Z, respectively, follow
on A, B, C, D, E . . . From these statistics we know the 
most probable result of the causes A, B, C, D, E and the 
frequency of each deviation from this most probable result. 
The recognition that in the existing state of our knowledge the 
true method of approaching the problem of heredity is from 
the statistical side, and that the most that we can hope at 
present to do is to give the probable character of the offspring 
of a given ancestry, is one of the great services of Francis 
Galton to biometry." 

Just as the biologists cannot predict a man's height or color 
of eyes or temper or combativeness by knowing those quali- 
ties in his ancestors, so economists cannot predict that a defin- 
ite call rate in Wall Street will go with a given percentage of 
reserves to deposits in New York banks or that a given supply 
of wheat will result in a definite price per bushel. But, on the 
other hand, just as it has been observed that there is a rela- 
tion existing between a man's stature and the stature of his 
ancestors, so it has been observed that a relation does exist 
between bank reserves and call rates and between supply of 
wheat and its price per bushel. 

In order to deal in a satisfactory way with such questions 
as those given above it is necessary to accumulate statistics of 
the supposedly related phenomena. In order to have those 

*The Law of Ancestral Heredity, Biometrika, Vol. II, p. 215. 




statistics indicate anything it is necessary to obtain a method 
of measuring the extent of correlation between the phenomena. 
The commonly used method of measuring the amount of 
correlation between any two series of economic statistics is to 
represent the two series graphically upon the same sheet of 
cross-section paper and then compare the fluctuations of one 
series with those of the other. The quantity theory of prices 
has been tested in this way by Dr. E. W. Kemmerer.* Dr. 
Kemmerer builds up the following price equation: 

$$P_s = \frac{MR + CR_c}{NE + N_cE_c}$$ †

in which:

$P_s$ = the average price (weighted by the total flows) of all commodities sold for money and deposit currency during a unit of time.

$M$ = the total currency in circulation during the unit of time.

$R$ = the average number of times each unit of currency changes hands during the unit of time.

(The product $MR$ is the flow of currency.)

$NE$ = the flow of goods exchanged for currency.

$C$ = the volume of deposit currency exchanged for goods.

$R_c$ = the average rate of turnover of such deposit currency.

(The product $CR_c$ is the flow of deposit currency.)

$N_cE_c$ = the flow of goods exchanged for deposit currency.




Dr. Kemmerer then attempts to find the answer that facts 
give to the following questions: 

1. Do the bank reserves vary directly with the money 
supply? 

2. Does the proportion of bank reserves to check circula- 
tion vary directly with the degree of business distrust existing 
in the country? 

*Money and Credit Instruments in their Relation to General Prices, Henry Holt and Co., 
1907. 

†See Quarterly Journal of Economics, Feb., 1908, p. 274, for derivation of equation. In the
current article, I quote from the review of Doctor Kemmerer's book. 

3. Is "a relative increase in the circulating media accompanied
by a corresponding and proportionate increase in general
prices and a relative decrease in the circulating media, by
a corresponding and proportionate decrease in general prices,"
or, in the language of the formula, is

$$P_s = \frac{MR + CR_c}{NE + N_cE_c}$$ *

borne out by the facts? 

All of the questions to be tested by the statistics collected 
are questions of correlation. Dr. Kemmerer makes the tests 
graphically, as has been stated, by comparing the fluctuations 
of the two curves based upon the pair of series of statistics 
being considered. The charts presented by Dr. Kemmerer 
from which his conclusions are drawn are given below. 

In the case of the correlation of bank reserves and money 
in circulation, inclusive of bank reserves, Dr. Kemmerer con- 
cludes, "There can be no question but that when due allow- 
ance is made for fluctuations in business confidence, the evi- 
dence of Chart I strongly supports the contention that there 
exists a close relationship between the amount of money in 
circulation and the amount of the country's bank reserves."†
In the case of the correlation of business distrust and the ratio 
of bank reserves to check circulation the conclusion is, "the 
chart substantiates the contention . . . that the ratio 
of check circulation to bank reserves is a function of business 

*Money and Prices, p. 139. 

Dr. Kemmerer uses statistics of the United States for the period 1879-1904 to make 
his inductive tests. The statistics of total bank reserves he obtains from the Report of 
the Comptroller of the Currency. For the amount of money in each year (M) he takes 
the average of the total money in circulation at the beginning and end of that fiscal year 
as given in the Statistical Abstracts. The check circulation for each year he obtains by 
multiplying the total bank clearings in each year by ^. This ratio is the ratio between 
the estimated total check circulation for 1896 and the bank clearings for that year. The 
rapidity of circulation of 47 per year he derives by dividing the estimated total money 
transactions in 1896 by the money circulation of that year. The figures for the growth 
of business he finds by taking the simple average of index numbers of fifteen different 
series of statistics taken as representing the industrial activity of the year considered. 
The index numbers of business distrust are the simple averages of the corresponding 
indices for the proportion of concerns failing, and the average liabilities of concerns
failing. "The general index figures of prices and wages were computed by combining in a
weighted average the index figures for the prices of railroad securities (Commons), the
index figures for the prices of wholesale commodities (Commons), and the index figures
for wages (Department of Labor tables for twenty-five occupations)."

†Ibid., p. 143.
confidence . . ."* The final test of the quantity theory
is the amount of correlation between the figures for the right
and left-hand sides of the equation $P_s = \frac{MR + CR_c}{NE + N_cE_c}$. Upon
examination of the curves plotted from the two series of statistics
representing general prices and relative circulation (the
left and right-hand sides, respectively, of the price equation)
Dr. Kemmerer concludes, "The general movement of the two
curves taken as a whole is the same, while the individual
variations from year to year exhibit a striking similarity."†

The graphic method of comparing fluctuations is well 
enough as a preliminary, but does it enable anyone to tell any- 
thing of the extent of the correlation between the series of figures be- 
ing considered? Is Dr. Kemmerer warranted in deducing his 
conclusions from observation of the charts? It seems to the 
writer that one opposing the quantity theory might draw 
opposite conclusions with as much (or as little) reason. The 
charts do not answer the questions proposed. The painstaking 
collection of statistics to test correlation is useless if there be 
no more reliable method to measure correlation. A numerical 
measure of the correlation must be found if we wish to de- 
termine the extent to which the fluctuations of one series 
synchronize with the fluctuations of another series. 

A second illustration of a conclusion based upon graphic 
representation is that of Ira Cross in his study of strike
statistics.‡ He says, upon consideration of data taken from
the Twenty-first Annual Report of the United States Bureau
of Labor, "the percentage of successful strikes decreases
during periods of business prosperity and increases during 'hard
times.'" In the accompanying charts the per cent. of
establishments in which strikes were successful is plotted, first,
with the per capita exports and imports and second, with index 
numbers of wholesale prices. The foreign trade and the 
price statistics are taken as indicative of the activity of busi- 
ness, as indices of prosperity. 

*Ibid., p. 146. 
†Ibid., p. 147.
‡Quarterly Publications of the American Statistical Association, June, 1908, p. 168.



[Chart: per cent. of establishments in which strikes were successful, plotted with per capita imports plus exports.]

[Chart: per cent. of establishments in which strikes were successful, plotted with index numbers of prices.]


A third illustration of a conclusion relating to correlation 
is taken from the London Statist of April 4, 1908, where the 
proposition is made that, "When commodities advance prices 
of Stock Exchange securities recede; when commodities recede 
Stock Exchange securities advance." The proposition is
supported by reference to the following chart showing the yearly
average price of consols and Sauerbeck's index numbers of
prices. 




[Chart: yearly average price of Consols, plotted with Sauerbeck's index numbers.]


The foregoing illustrations show the need by economists of
a quantitative measure of correlation. Such a measure has
been widely used in biological statistics and used to a limited
extent in economic statistics.* G. U. Yule has used the
measure in his study of "Pauperism";† R. H. Hooker has used
it in his "Correlation of the Weather and Crops";‡ J. P.
Norton applied it in his study of the "New York Money
Market." This measure, the coefficient of correlation, will be
computed for the data upon which the conclusions quoted
above are based. The formula for the coefficient of correlation
is

$$r = \frac{\sum xy}{n\sigma_1\sigma_2}$$
where:

x = deviation from arithmetic mean = $X - M_1$
y = deviation from arithmetic mean = $Y - M_2$
$\sigma_1$ = standard deviation of X series
$\sigma_2$ = standard deviation of Y series
n = number of items.

The coefficient of correlation "serves as a measure of any 
statement involving two qualifying adjectives, which can be 
measured numerically, such as 'tall men have tall sons,' 
'wet springs bring dry summers,' 'short hours go with high 
wages.' " § It is not the purpose in what follows to go through 
the mathematical derivation of the coefficient of correlation, 
but to test the formula empirically in order to ascertain how it 
actually varies for given series of statistics and to point out 
some of its features. 

However, it should be noted at this point that the coeffi- 
cient of correlation is not empirical but was derived by a 
priori reasoning. It was found by assuming that a large 
number of independent causes operate upon each of the two 

*See the very interesting paper by G. U. Yule on the Applications of the Method of 
Correlation to Social and Economic Statistics. (Journal of the Royal Statistical Society, 
December, 1909.) This paper gives a history of the application of the method of correlation 
with a bibliography of works in which it has been applied to economic and social statistics. 

†Journal of the Royal Statistical Society, 1899, Vol. 62, pp. 249-286.

‡Journal of the Royal Statistical Society, March, 1907, p. 1.

§Bowley, A. L., Elements of Statistics, p. 320. 
series X and Y, producing normal distributions in both cases.
Upon the assumption that the set of causes operating upon the
series X is not independent of the set of causes operating upon
the series Y the value $r = \frac{\sum xy}{n\sigma_1\sigma_2}$ is obtained. This value
becomes zero when the operating causes are absolutely independent.
Hence the value of r was taken as a measure of
correlation.* In what follows no assumption concerning
the type of distribution of the X and Y series will be made.

Some appreciation of the meaning of the coefficient of 
correlation can be obtained by the consideration of a few 
simple applications. Suppose that we consider the two series 
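of measurements:

X = 1, 2, 3, 4, 5         M_1 = 3
Y = 6, 8, 10, 12, 14      M_2 = 10

   Deviations      Square of deviation     Product of deviations
    x      y          x^2      y^2                  xy
   -2     -4           4       16                    8
   -1     -2           1        4                    2
    0      0           0        0                    0
   +1     +2           1        4                    2
   +2     +4           4       16                    8
Total                 10       40                   20

$\sigma_1 = \sqrt{2}$, $\sigma_2 = 2\sqrt{2}$

$$r = \frac{20}{5 \cdot \sqrt{2} \cdot 2\sqrt{2}} = 1$$
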

In the above illustration the numbers were chosen so that 
for an increase of 1 unit in the X series there is an increase of 
2 units in the Y series. Thus the correlation is perfect and 
r equals +1. If the Y series had been 14, 12, 10, 8, 6 (the 
X series remaining the same) the value of r would have been 
-1. Thus -1 stands for perfect negative correlation, an
increase in one series corresponding to a decrease in the
other. It should also be noted in this connection that the
coefficient of correlation (r) cannot be less than -1 nor more
than +1.†
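
The arithmetic of the illustration can be checked mechanically. The following is a minimal sketch in Python, not part of the original paper; the function name `pearson_r` is an illustrative choice, and the only data used are the two series given in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Coefficient of correlation r = sum(xy) / (n * sigma_1 * sigma_2),
    where x and y are deviations from the arithmetic means."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    dx = [x - m1 for x in xs]          # deviations x = X - M_1
    dy = [y - m2 for y in ys]          # deviations y = Y - M_2
    sigma1 = sqrt(sum(d * d for d in dx) / n)
    sigma2 = sqrt(sum(d * d for d in dy) / n)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / (n * sigma1 * sigma2)


X = [1, 2, 3, 4, 5]
r_pos = pearson_r(X, [6, 8, 10, 12, 14])   # Y rises 2 units per unit of X
r_neg = pearson_r(X, [14, 12, 10, 8, 6])   # the same Y series reversed
print(round(r_pos, 6), round(r_neg, 6))
```

The two results agree with the values +1 and -1 stated in the text, up to floating-point rounding.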

*See Yule, Journal of the Royal Statistical Society, Vol. 60, p. 839; Bowley, Elements 
of Statistics, p. 316; Elderton, Frequency Curves and Correlation, p. 106. 

†Proof of this statement may be conveniently found in Bowley, Elements of Statistics,
p. 319. 

The above illustration suggests the question, "Will a
linear relationship between X and Y always give perfect
correlation?"

Assume the linear relationship
$$Y = aX + b$$
Since $y = Y - M_2$ and $x = X - M_1$,
$$M_2 + y = a(x + M_1) + b, \quad\text{or}\quad y = ax$$
(since $b + aM_1 - M_2 = 0$), and
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{\sum ax^2}{\sqrt{\sum x^2 \cdot \sum a^2x^2}} = \frac{a\sum x^2}{\sqrt{a^2\left(\sum x^2\right)^2}} = \pm 1$$
(The sign of r depends upon the sign of a.)

Therefore a linear relationship between two variables will 
give a correlation coefficient of +1 or —1 depending upon 
whether large values of one occur with large values of the 
other or large values of one occur with small values of the 
other. 

The converse of the above proposition is likewise true, i. e., 
if the coefficient of correlation (r) equals 1 then the relationship 
between the X and Y series is linear. 

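Assume $r = 1$;

then $\left(\sum xy\right)^2 - \sum x^2 \cdot \sum y^2 = 0$.

Letting $x_1 = \lambda_1 y_1$, $x_2 = \lambda_2 y_2$, . . . $x_n = \lambda_n y_n$, the above expression
becomes

$$y_1^2y_2^2(\lambda_1 - \lambda_2)^2 + y_1^2y_3^2(\lambda_1 - \lambda_3)^2 + \dots + y_r^2y_s^2(\lambda_r - \lambda_s)^2 + \dots = 0$$

The only way in which this expression can equal zero
is by having

$$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_n$$

and it follows that

$$x_1 = \lambda_1 y_1, \quad x_2 = \lambda_1 y_2, \quad \dots \quad x_n = \lambda_1 y_n$$

or

$$x = \lambda_1 y$$

which denotes a linear relationship between X and Y.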

That any relation other than a linear one will not lead to
r = 1 is illustrated by the following:

Let Y = X^2

X = 1, 2, 3, 4, 5         M_1 = 3
Y = 1, 4, 9, 16, 25       M_2 = 11

    x       y        x^2      y^2       xy
   -2     -10         4       100       20
   -1      -7         1        49        7
    0      -2         0         4        0
   +1      +5         1        25        5
   +2     +14         4       196       28
Total                10       374       60

$\sigma_1 = 1.41$, $\sigma_2 = 8.65$, $r = 0.981$






Although the two series increase regularly, so that devia- 
tions of like signs always correspond, yet the correlation is not 
perfect because a linear relation does not exist between X and Y. 

If the number of items in each series be increased to 11 and 
the Y items remain squares of the X's the value of r will be 
0.974. 
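
Both figures for the case Y = X^2 can be reproduced with a few lines of Python. This is a sketch added for illustration only; `pearson_r` is a hypothetical helper name, and it uses the equivalent form r = Σxy / √(Σx² · Σy²), which follows from Σx² = nσ₁² and Σy² = nσ₂².

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y taken as
    deviations from the arithmetic means of the two series."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    sum_xy = sum((x - m1) * (y - m2) for x, y in zip(xs, ys))
    sum_x2 = sum((x - m1) ** 2 for x in xs)
    sum_y2 = sum((y - m2) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


x5 = list(range(1, 6))      # X = 1, ..., 5
x11 = list(range(1, 12))    # X = 1, ..., 11
r5 = pearson_r(x5, [x * x for x in x5])
r11 = pearson_r(x11, [x * x for x in x11])
print(round(r5, 3), round(r11, 3))
```

The coefficient stays close to, but below, unity in both cases, since the relation is regular but not linear.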

If there be no law connecting the X and Y series the products 
of the deviations (xy) are as apt to be negative as positive. The 
expression Sxy will therefore tend to approach zero. With a 
very large number of measurements the correlation coefficient 
will approximate zero. 
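
The tendency of Σxy toward zero for unconnected series can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a result from the paper: it draws two independent series of 10,000 normally distributed values with an arbitrary seed, and `pearson_r` is a hypothetical helper name.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)) on deviations from the means."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    sum_xy = sum((x - m1) * (y - m2) for x, y in zip(xs, ys))
    sum_x2 = sum((x - m1) ** 2 for x in xs)
    sum_y2 = sum((y - m2) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


rng = random.Random(1910)   # arbitrary seed, chosen only for repeatability
n = 10_000                  # "a very large number of measurements"
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rng.gauss(0, 1) for _ in range(n)]   # drawn independently of xs

r_independent = pearson_r(xs, ys)
print(f"{r_independent:+.4f}")   # a small value near zero
```

With fewer measurements the coefficient of two unconnected series can stray noticeably from zero, which is why the number of items matters when a small coefficient is interpreted.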

From the condition of no relationship to the condition of a
linear relationship existing between the pair of series of
measurements the correlation coefficient varies from 0 to ±1.

Suppose that we are investigating the relation existing 
between two series of measurements X and Y. Let points be 
plotted on cross-section paper whose coordinates are corresponding
measurements $X_i$ and $Y_i$. If there be a relationship
existing between the two series, the points thus located will 
not lie chaotically all over the plane, but they will range them- 
selves about some curve or locus. This curve, which has been 
called the curve of regression, is illustrated in the accompanying 
diagram (p. 302).* The straight line best fitting the points is
called the line of regression.†

*Distributions similar to that of the diagram may be found in Natural Selection in
"Helix arbustorum," A. P. di Cesnola, Biometrika, Vol. V, Part IV, p. 392, and
Anthropometry of Scottish Insane, J. F. Tocher, Biometrika, Vol. V, Part III.

†So named by Francis Galton from the fact that if any group of men be picked out for an
exceptional characteristic (say height) any other characteristic of those men correlated 
with height will regress toward the normal, or in other words the second characteristic will 
be less abnormal than the first. 

For example suppose we consider the two series of index
numbers for the period 1879-1904 inclusive, representing (1) 
money in circulation in the United States inclusive of bank 
reserves, and (2) bank reserves.* Let points be located with 
abscissas proportionate to the money in circulation and 
with ordinates proportionate to the bank reserves of the same 
year. The chart on the next page shows that these points 
lie near a straight line, the line of regression. 

The coefficient of correlation (r) is a measure of the closeness 
of the grouping of the points about this line of regression. If the 
points should all range themselves on a line then r would equal 
+1 or —1 depending upon whether, looking left to right, the 
line sloped upward or downward. 

We will now derive the equation of the line of regression. 
Let X and Y be associated measurements and x and y be 
associated deviations from the respective arithmetic means. 
A linear relation between the measurements is of the form

$$Y = a_1X + b_1$$

The relation between the deviations will be of the form

$$y = a_1x \quad\text{or}\quad y - a_1x = 0$$

Since all of the points are not located exactly upon a straight
line the substitution of the values $x_1y_1$, $x_2y_2$, etc. in the
equations will give residues $v_1$, $v_2$, etc. as follows:

$$y_1 - a_1x_1 = v_1$$
$$y_2 - a_1x_2 = v_2$$
$$\dots$$
$$y_n - a_1x_n = v_n$$

The values $\frac{v_1}{\sqrt{1 + a_1^2}}, \frac{v_2}{\sqrt{1 + a_1^2}}, \dots, \frac{v_n}{\sqrt{1 + a_1^2}}$ equal
the distances of the various points to the straight line $y = a_1x$.
The equation of a line such that the sum of the squares of
the distances from the given points is a minimum will now
be found. In other words that value of $a_1$ will be taken which
makes $v_1^2 + v_2^2 + \dots + v_n^2$ a minimum. To find the
value of $a_1$ for which

$$(y_1 - a_1x_1)^2 + (y_2 - a_1x_2)^2 + \dots + (y_n - a_1x_n)^2$$

will be a minimum, differentiate with respect to $a_1$ and obtain

$$-2x_1(y_1 - a_1x_1) - 2x_2(y_2 - a_1x_2) - \dots - 2x_n(y_n - a_1x_n)$$

In order that the original function be a minimum, this
derivative must equal zero. We will then have

*Kemmerer, Money and Prices, p. 141. 

[Chart: bank reserves (ordinates) plotted against money in circulation inclusive of bank reserves (abscissas); columns I and II on p. 141 of Kemmerer, Money and Prices, years 1879-1904 inclusive. Line of regression: y = 1.59x - 46.3.]

$$(x_1y_1 - a_1x_1^2) + (x_2y_2 - a_1x_2^2) + \dots + (x_ny_n - a_1x_n^2) = 0, \quad\text{or}\quad \sum xy - a_1\sum x^2 = 0$$

$$a_1 = \frac{\sum xy}{\sum x^2}$$

Similarly if $x = a_2y$, then $\sum xy - a_2\sum y^2 = 0$ will give the value
of $a_2$ for which the sum of the squares of the distances of the
given points to the straight line $X = a_2Y + b_2$ is a minimum, or

$$a_2 = \frac{\sum xy}{\sum y^2}$$

Let $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{\sum xy}{n\sigma_1\sigma_2}$ and $\sum x^2 = n\sigma_1^2$, $\sum y^2 = n\sigma_2^2$.

The equations between the deviations are:

$$y = r\frac{\sigma_2}{\sigma_1}x$$

$$x = r\frac{\sigma_1}{\sigma_2}y$$

It may seem that the two equations just given are incon- 
sistent. But it must be remembered that these equations do 
not give the relationship existing between any corresponding 
pair of deviations unless all of the points lie exactly on a 
straight line and there be perfect correlation. For all cases of 
imperfect correlation a given deviation x will occur with several 
different deviations y (if we have a large number of measure- 
ments). If these deviations y are distributed according to the 
normal law of distribution then the given value x substituted 
in the first equation will give the mean of the deviations occurring
with the deviation x and if a given value y be substituted
in the second equation the value of x resulting will be the 
mean of the deviations of the associated characteristics. 

Since $y = Y - M_2$ and $x = X - M_1$,

$$Y = M_2 + r\frac{\sigma_2}{\sigma_1}(X - M_1)$$

and

$$X = M_1 + r\frac{\sigma_1}{\sigma_2}(Y - M_2)$$

The coefficients $r\frac{\sigma_2}{\sigma_1}$ and $r\frac{\sigma_1}{\sigma_2}$ are called the coefficients of
regression of Y upon X and of X upon Y respectively. The
first coefficient ($r\frac{\sigma_2}{\sigma_1}$) and the reciprocal of the second ($\frac{\sigma_2}{r\sigma_1}$)
are the slopes of the lines of regression. If X and Y be measured
in terms of their respective standard deviations as units
the slopes of the lines of regression will be $r$ and $\frac{1}{r}$. In other
words, the slope of the line of regression of Y upon X, each series
being measured in terms of its standard deviation, is equal to
the coefficient of correlation for the two series. For perfect positive
correlation the line would make an angle of 45° with the X
axis, for perfect negative correlation the line would make an angle
of 135° with the X axis, and for no correlation the line would
be parallel to the X axis.
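
The two coefficients of regression and their relation to r can be computed directly from the sums Σxy, Σx² and Σy². The sketch below is illustrative only: the function name `regression_summary` and its dictionary keys are hypothetical, and the data are the series Y = X^2 already used in the text. It relies on the identity a₁ · a₂ = r², which follows from a₁ = rσ₂/σ₁ and a₂ = rσ₁/σ₂.

```python
from math import sqrt


def regression_summary(xs, ys):
    """Means, both coefficients of regression, and r for paired series."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    sum_xy = sum((x - m1) * (y - m2) for x, y in zip(xs, ys))
    sum_x2 = sum((x - m1) ** 2 for x in xs)
    sum_y2 = sum((y - m2) ** 2 for y in ys)
    return {
        "M1": m1,
        "M2": m2,
        "a1": sum_xy / sum_x2,                # regression of Y upon X
        "a2": sum_xy / sum_y2,                # regression of X upon Y
        "r": sum_xy / sqrt(sum_x2 * sum_y2),  # coefficient of correlation
    }


X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 25]
s = regression_summary(X, Y)

# Line of regression of Y upon X: Y = M2 + a1 * (X - M1)
print(f"Y = {s['M2']} + {s['a1']} * (X - {s['M1']})")
# The product of the two coefficients of regression equals r squared.
print(round(s["a1"] * s["a2"], 6), round(s["r"] ** 2, 6))
```

The two lines coincide only when r = ±1; for |r| below unity the slopes a₁ and 1/a₂ differ, which is the apparent inconsistency the text explains.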

In the preceding we have fitted a straight line to the points. 
In some cases the points may arrange themselves along some 
locus such as could be represented by $y = a + bx + cx^2 + \dots$
or $y = a + bc^x$ or any other empirical function. The
work in fitting such a curve to the points would be great and 
the advantages are not sufficient to compensate for the addi- 
tional work.* 

The following table gives the correlation coefficients for the 
various pairs of series of statistics for which Dr. Kemmerer has 
attempted to show correlation by means of charts: 



Statistics of                                      Period covered    Coefficient of correlation
                                                                     ± probable error

Money in circulation inclusive of bank reserves    1879-1904         +0.98 ± 0.006
Bank reserves

Business distrust                                  1879-1904         +0.53 ± 0.095
Ratio of bank reserves to check circulation

Business distrust                                  1879-1903         +0.72 ± 0.064
Ratio of bank reserves to check circulation        1880-1904

Relative circulation                               1879-1901         +0.23 ± 0.13
General prices




*Karl Pearson in an article "On the General Theory of Skew Correlation and Non- 
linear Regression" (published by Dulau and Co. in 1905 as one of the Drapers' Company 
Research Memoirs) discusses the cases in which the law connecting the two series of sta- 
tistics is not linear. He says, "In the great bulk of biometrical and economical enquiries, 
however, the regression does not diverge very markedly from the linear form. In the cases 
of non-linear regression that I have hitherto had to deal with, I find that parabolae of the 
second or third order will suffice as a rule to describe the deviations from linearity" (p. 21). 
Equations of parabolae of the second and third order are of the form 

$y = a_0 + a_1x + a_2x^2$
and $y = a_0 + a_1x + a_2x^2 + a_3x^3$

The correlation coefficients show that there is a very great
difference in the degree of correlation of different pairs of series 
of statistics. The full significance of the "probable error," 
which is used as a measure of unreliability of any determina- 
tion, cannot be developed at this point.* It is sufficient to 
note that, "When r is not greater than its probable error we 
have no evidence that there is any correlation, for the ob- 
served phenomena might easily arise from totally unconnected 
causes; but, when r is greater than, say, six times its probable 
error, we may be practically certain that the phenomena are 
not independent of each other, for the chance that the observed 
results would be obtained from unconnected causes is
practically zero."†
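
The text does not state the formula used for the probable error. The sketch below assumes the classical expression 0.6745(1 - r²)/√n, which reproduces several of the tabulated values (for example ±0.095 for r = +0.53 with 26 pairs); that formula, the function names, and the label "inconclusive" for the range between the two thresholds in the quotation are assumptions made for illustration.

```python
from math import sqrt


def probable_error(r, n):
    """Assumed classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


def verdict(r, n):
    """Apply the rule quoted from Bowley to a coefficient r from n pairs."""
    ratio = abs(r) / probable_error(r, n)
    if ratio <= 1:
        return "no evidence"           # r not greater than its probable error
    if ratio > 6:
        return "practically certain"   # r greater than six times its probable error
    return "inconclusive"              # label assumed for the range in between


for r, n in [(0.53, 26), (0.23, 23), (-0.58, 57), (-0.076, 24)]:
    print(f"r = {r:+.3f}, n = {n}: PE = {probable_error(r, n):.3f}, {verdict(r, n)}")
```

Small discrepancies in the last decimal against the printed tables are to be expected, since the rounding conventions and exact item counts used in the original computations are not given.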

The high degree of correlation (+0.98) between money in 
circulation inclusive of bank reserves and bank reserves is 
due to the tendency of the two items to vary together during 
the long time period and not due to correspondence of minor 
fluctuations. The reasons for the great increase of money in 
circulation in the United States during the period 1879-1904 
are the great increase of population and the industrial expan- 
sion. Likewise the number of banks increased in order to serve 
the increased population and this meant an increase of total 
reserves. It is self-evident that the long time tendency of the 
two series of statistics must be upward in a growing country. 
It seemed to me that the bank reserves during the 26 years, 
1879-1904, would be as closely correlated with the population 
as with total circulation. The computation of the correlation 
coefficient between bank reserves and population gave +0.98. 
It is the variation upwards of both series during the entire 
period that causes the high coefficient. 
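
The effect of a common upward tendency can be illustrated with simulated data. The sketch below is hypothetical throughout: it builds two series that share nothing except a rising trend, over 500 periods rather than the 26 years of the text, and it uses year-to-year changes (first differences) as one simple device for setting the trend aside. That device is chosen here for illustration and is not presented as the method of the paper.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)) on deviations from the means."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    sum_xy = sum((x - m1) * (y - m2) for x, y in zip(xs, ys))
    sum_x2 = sum((x - m1) ** 2 for x in xs)
    sum_y2 = sum((y - m2) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


rng = random.Random(0)   # arbitrary seed for repeatability
n = 500                  # hypothetical number of periods

# Two series with independent fluctuations about different upward trends.
a = [1.0 * t + rng.gauss(0, 5) for t in range(n)]
b = [2.0 * t + rng.gauss(0, 5) for t in range(n)]

r_levels = pearson_r(a, b)   # dominated by the shared long-time tendency

# Period-to-period changes remove the trend and leave the fluctuations.
da = [a[t] - a[t - 1] for t in range(1, n)]
db = [b[t] - b[t - 1] for t in range(1, n)]
r_changes = pearson_r(da, db)

print(f"levels: {r_levels:+.3f}   changes: {r_changes:+.3f}")
```

A coefficient near unity for the levels alongside a coefficient near zero for the changes is the pattern the text describes for bank reserves and population: agreement in the long-time movement without correspondence of the minor fluctuations.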

The correlation coefficient between the index numbers of
business distrust and the ratios of bank reserves to check
circulation for the same years is 0.53. When the index numbers of
business distrust for one year are correlated with the ratio 
of bank reserves to check circulation the following year the 
coefficient is 0.72. As Dr. Kemmerer has suggested (but 

*In the computation of the formulas for the probable error it is assumed that errors are 
distributed "normally." 
†Bowley, Elements of Statistics, p. 320.
not verified), there is a closer correlation "when proper
allowance is made for the time required for alterations in business
confidence to exert their influence on bank reserves."* The
lowest correlation (+0.23), that between relative circulation
and general prices, is not high enough to warrant a
conclusion that the items vary together. The smallness of the
correlation indicated may have resulted either because the
quantity theory is in error or because the statistics are not adequate
to test the theory. Whatever may be the fact, the statistics
and the method of measuring correlation presented by Dr.
Kemmerer do not demonstrate that general prices move in
sympathy with relative circulation.
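
Correlating one series with the following year's values of another amounts to pairing item t of the first with item t + 1 of the second. A minimal sketch follows; the function names and the two short series are hypothetical, constructed so that the second series follows the first exactly one period later, and are not Dr. Kemmerer's figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)) on deviations from the means."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(ys) / n
    sum_xy = sum((x - m1) * (y - m2) for x, y in zip(xs, ys))
    sum_x2 = sum((x - m1) ** 2 for x in xs)
    sum_y2 = sum((y - m2) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


def lagged_r(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag]; lag = 0 pairs the same periods."""
    if lag < 0 or lag >= len(xs):
        raise ValueError("lag must satisfy 0 <= lag < len(xs)")
    if lag == 0:
        return pearson_r(xs, ys)
    return pearson_r(xs[:-lag], ys[lag:])   # one pair is lost per period of lag


# Hypothetical index numbers: ratio[t + 1] = 10 + 2 * distrust[t].
distrust = [3, 5, 2, 6, 4, 7, 5, 8]
ratio = [10, 16, 20, 14, 22, 18, 24, 20]

print(round(lagged_r(distrust, ratio, 0), 3))   # same-year pairing
print(round(lagged_r(distrust, ratio, 1), 3))   # distrust against next year's ratio
```

Each period of lag shortens the usable series by one pair, which is why the lagged comparison in the table covers 1879-1903 against 1880-1904.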

Is the contention of Mr. Cross that "the percentage of suc- 
cessful strikes decreases during periods of business prosperity 
and increases during 'hard times' " supported by the sta- 
tistics? The following table gives the correlation coefficients 
between pairs of series of statistics, one of the pair in each
case being the per cent. of establishments in which strikes
succeeded, and the other series being taken as indicative of
business conditions:



Statistics of                                      Period            Coefficient of correlation
                                                                     ± probable error

Per cent. of establishments in which strikes       1881-1905         -0.146 ± 0.132
  succeeded
Index numbers of wholesale prices from Aldrich
  Report and United States Labor Bureau†

Per cent. of successful strikes                    1881-1904         -0.086 ± 0.134
Prices                                             1882-1905

Per cent. of successful strikes (year ending       1881-1905         -0.178 ± 0.130
  Dec. 31)
Per capita foreign trade (year ending June 30)

Per cent. of successful strikes                    1881-1904         -0.076 ± 0.134
Index numbers of business distrust‡                1881-1904




The amount of correlation indicated in each case is small — 
considering the number of years taken, so small that no con- 
clusion as to the connection between the two series can be 

*Money and Prices, p. 146. 

†Reduced to a continuous series.

‡Kemmerer's figures from Money and Prices, p. 141.




drawn. The correlation coefficient in the last instance, i.e.,
between per cent. of successful strikes and business distrust,
suggests an opposite conclusion to that indicated by the other
coefficients and that of Mr. Cross. The analysis shows that the
conclusion that there is negative correlation between general
prosperity and per cent. of successful strikes is not warranted.

Finally, what is the degree of correlation between the prices 
of British Consols and Sauerbeck's index numbers of the prices 
of commodities? The chart on p. 297 indicates a greater degree 
of correlation (negative) between the minor fluctuations of the 
two series than shown by any of the pairs of series that we have 
considered. The coefficient of correlation based upon statistics
for the 57 years from 1851 to 1907, inclusive, is -0.58±0.06.
A correlation coefficient of -0.58 based upon 57 pairs of
items warrants the conclusion that the two series have inverse
movements.

The relations between the average deviations, x and y, of the
two series of statistics being considered are:*

y = -1.465x and x = -0.2295y

The equations of regression are:

Y = 225.6 - 1.465 X and X = 19.439 - 0.2295 Y
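
The figures quoted for Consols and commodity prices can be checked against one another. The sketch below assumes the classical probable-error formula, 0.6745(1 - r^2)/sqrt(n), which is not restated in this part of the article, and uses the fact that the product of the two regression slopes equals r^2. The function names are illustrative only.

```python
import math

def probable_error(r: float, n: int) -> float:
    # Classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n).
    # Assumption: this is the formula behind the "±" figures in the text.
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

def r_from_slopes(b_yx: float, b_xy: float) -> float:
    # r^2 equals the product of the two regression slopes;
    # r carries the sign that the slopes share.
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two regression slopes must share a sign")
    return math.copysign(math.sqrt(product), b_yx)

# Slopes, r and n as quoted in the text for Consols and Sauerbeck's index.
consols_r = r_from_slopes(-1.465, -0.2295)
consols_pe = probable_error(-0.58, 57)
print(f"r = {consols_r:.2f}, probable error = {consols_pe:.2f}")
```

Under these assumptions the slopes give r of about -0.58 and the probable error comes to about 0.06, in agreement with the values stated above.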

For certain pairs of time-series of measurements (corresponding
items occurring at the same time) a correlation coefficient
approximating zero may be obtained even though graphs of the
statistics show that the up-and-down fluctuations occur
together. This result will come about if the long-time
variations show opposite tendencies, as, for instance, in the
statistics of marriages and bank clearings in the United Kingdom.
On the other hand, a high correlation coefficient may be
obtained for two series having the same long-time tendency
regardless of the non-correspondence of the short-time
fluctuations. For example, the coefficient for the two series,
population and bank reserves, came out to be 0.98. This high
coefficient comes from the fact that the long-time variation of
both series is the same. Consequently, before it is legitimate to
draw any conclusions as to the meaning of a lack of correlation,
or amount of correlation, between two series of measurements
it is necessary to ascertain the periodic and the secular
variations in the two series. The correlation coefficient may be
large through the correspondence of either secular or periodic
variation, or both. It may be null because one variation covers
up the other.

*If a value $x_1$ be substituted for the deviation x in the equation
$y = r \frac{\sigma_y}{\sigma_x} x$ we ought to get an approximation to the
average value of the deviations of the Y character, all of which
characters are associated with the deviation $x_1$ of the X character.
Likewise if $y_1$ be substituted in $x = r \frac{\sigma_x}{\sigma_y} y$ we
ought to get an approximation to the average of the associated x
deviations. The closeness of the approximation will increase as the
form of distribution of one series of characters associated with a
given value of the other character (called an array) approaches the
normal curve of error.

Three methods have been used for isolating the short-time 
variations of time-series of measurements. They will now be 
considered. 

1. If upon plotting the two series being compared with time
as abscissa and the measurements as ordinates, periodic
variations appear at approximately equal intervals of time, the
curve may be "smoothed" and the secular variations may be
eliminated as follows:*

(a) Ascertain the length of the wave by finding the number 
of time units between corresponding parts of the waves, i. e., 
crest to crest, or hollow to hollow. Let l represent the number
of time units found.

(b) Average groups of l consecutive measurements, placing
the points determined by these averages at the middle of each
group of measurements. Take enough groups so that the
points obtained will indicate the general tendency of the 
series. 

(c) Draw a smooth curve through the points located by the 
process described in (b). This curve shows the secular ten- 
dency. 

(d) Subtract (this can be done graphically on cross-section 
paper) the ordinates of the "smoothed" curve from those of 
the original curve in order to obtain the series of measurements 
of the periodic fluctuation. Let d stand for any one of these 
differences. 

(e) The coefficients computed for corresponding ordinates 
of two smoothed curves, and for corresponding differences, d 
and d', give measures of the secular and periodic correlation, 
respectively. 

*Bowley, Elements of Statistics, pp. 176, et seq. 
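
Steps (a) to (e) can be sketched numerically. The example below is only an illustration under stated assumptions: the two series are synthetic (opposite straight-line trends carrying the same nine-period wave), the moving average itself stands in for the hand-drawn smooth curve of step (c), every overlapping group of l values is averaged, and l is taken to be odd so that each average falls on an actual date.

```python
import math

def correlation(xs, ys):
    # Ordinary product-moment coefficient of correlation.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def moving_average(series, l):
    # Step (b): mean of each group of l consecutive values,
    # placed at the middle of the group (l assumed odd).
    if l < 1 or l % 2 == 0:
        raise ValueError("l must be a positive odd number")
    half = l // 2
    return [sum(series[i - half:i + half + 1]) / l
            for i in range(half, len(series) - half)]

def deviations(series, l):
    # Step (d): original ordinate minus smoothed ordinate.
    half = l // 2
    return [series[i + half] - t
            for i, t in enumerate(moving_average(series, l))]

# Hypothetical data: wave length l = 9, trends in opposite directions.
L = 9
years = range(45)
wave = [3.0 * math.sin(2.0 * math.pi * t / L) for t in years]
rising = [t + w for t, w in zip(years, wave)]
falling = [-t + w for t, w in zip(years, wave)]

raw_r = correlation(rising, falling)
secular_r = correlation(moving_average(rising, L), moving_average(falling, L))
periodic_r = correlation(deviations(rising, L), deviations(falling, L))
print(round(raw_r, 2), round(secular_r, 2), round(periodic_r, 2))
```

In this constructed case the coefficient for the original series is strongly negative because the trends oppose one another, while the coefficient for the deviations is close to +1, which is the situation step (e) is meant to separate.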
The method described above has been applied by Mr. R. H.
Hooker in his paper "On the Correlation of the Marriage-Rate
with Trade,"* and by Mr. G. U. Yule in his study of "Changes
in Marriage and Birth-Rates in England and Wales during the
Past Half Century."† The following table gives the correlation
coefficients computed in the articles named for the
periodic variations:



Series                                              Period      Deviations      Coefficient of
                                                                from            Correlation

[...] and amount of bank clearings per capita       1861-1895   9 yr. means     +0.86
                                                    1876-1895   9 yr. means     +0.47
[...] and Sauerbeck's index numbers of prices       1865-1896   11 yr. means    +0.795
[...] and Hartley's index numbers of unem[...]      1870-1895   11 yr. means    -0.873



The effect of using the deviations rather than the original 
series in computing the coefficient is shown by the comparison 
of the first correlation coefficient of +0.86, given above, with 
the correlation coefficient of +0.18, obtained for the same two 
series of original measurements for the same period, 1861-1895. 

Using the deviation-method, Mr. Yule computed the cor- 
relation coefficients between first, the marriage rate of one year 
(m), and second, exports (e), imports (i), total trade (t), the 
price of wheat (w), and bank clearings (c) for the same year, 
and for each of several preceding years in order to answer the 
question, "does the maximum amount of correlation occur 
when corresponding items are of the same year or when the
marriage rate of one year is paired with the business item for a
preceding year?"
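
The search for the lag of maximum correlation can be sketched as follows. This is an illustration on synthetic data, not a reproduction of Mr. Yule's computation: the "business" series is constructed to lead the "marriage" series by exactly two periods, only whole-period lags are tried, and the function names are hypothetical.

```python
import math

def correlation(xs, ys):
    # Ordinary product-moment coefficient of correlation.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(business, marriages, lag):
    # Pair the business item of period t with the marriage item of period t + lag.
    if lag < 0 or lag >= len(marriages) - 1:
        raise ValueError("lag must be non-negative and leave at least two pairs")
    n = min(len(business), len(marriages) - lag)
    return correlation(business[:n], marriages[lag:lag + n])

# Hypothetical deviations: marriages repeat the business series two periods later.
signal = [math.sin(0.7 * t) + 0.3 * math.sin(2.1 * t) for t in range(42)]
business = signal[2:]
marriages = signal[:-2]

by_lag = {lag: lagged_correlation(business, marriages, lag) for lag in range(5)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, round(by_lag[best_lag], 3))
```

With real statistics the coefficients would first be computed from deviations (as in method 1), and the lag giving the largest coefficient would then be read off in the same way.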

*Journal of the Royal Statistical Society, September, 1901, Vol. 60. 
†Ibid., Vol. 69, pp. 88-132.
The following table gives the maximum values found:





Series        For period 1861-1895                    For period 1876-1895
Correlated    Business item precedes    Value of      Business item precedes    Value of
              marriage-rate item by     Coefficient   marriage-rate item by     Coefficient

r_me          ½ year                    +0.86         ¼ year                    +0.90
r_mi          ½ year                    +0.82         ½ year                    +0.86
r_mt          ¼ year                    +0.91         [illegible] year          +0.92
r_mw                                                  [illegible] year          +0.56
r_mc                                                  1 year                    +0.92



Having found that the trade cycle affects the marriage rate, Mr.
Yule asks the question, "Does the cycle affect the birth-rate,
and how?"* The relation between marriage-rate and birth-rate
is shown by the following table:



Series              Period      Deviations      (a) Correlated      Correlation
                                from            with (b) of         Coefficient ± P. E.

(a) Marriage-rate   1850-1896   11 year means   1 yr. following     0.352±0.086
(b) Birth-rate                                  2 yrs. following    0.479±0.076
                                                3 yrs. following    0.418±0.082



Mr. Yule says, "Fitting a parabola to the three values thus
determined, a maximum correlation of about 0.482 must
subsist between the birth-rate and the marriage-rate of 2.17 years
(two years and two months) previously."†
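
The parabola fit can be reproduced from the three tabulated coefficients. The sketch below passes a parabola through three equally spaced points and returns its vertex; the function name and arguments are illustrative, and the arithmetic is the ordinary three-point formula rather than anything taken from Mr. Yule's paper.

```python
def parabola_maximum(y_prev, y_mid, y_next, x_mid, step=1.0):
    # Vertex of the parabola through (x_mid - step, y_prev),
    # (x_mid, y_mid) and (x_mid + step, y_next).
    curvature = y_prev - 2.0 * y_mid + y_next
    if curvature >= 0:
        raise ValueError("the three points do not bracket a maximum")
    slope = y_next - y_prev
    x_peak = x_mid - step * slope / (2.0 * curvature)
    y_peak = y_mid - slope ** 2 / (8.0 * curvature)
    return x_peak, y_peak

# Coefficients for birth-rate 1, 2 and 3 years after the marriage-rate.
lag_years, max_r = parabola_maximum(0.352, 0.479, 0.418, x_mid=2.0)
print(round(lag_years, 3), round(max_r, 3))
```

The vertex falls a little beyond two years (about 2.18 by this arithmetic) with a coefficient of about 0.482, which agrees closely with the figures quoted from Mr. Yule.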

Further analysis leads Mr. Yule to the conclusion that
birth-rate is independently (not through marriage-rate only)
sensitive to short-time economic changes and that the
birth-rate is lowered after a depression, not only because of a
decrease in the number of marriages during such depression, but
also because of a decrease in fertility.

*Ibid., p. 123.
†Ibid., p. 123.

2. In case the statistics show a long-time tendency with no
regular periodic fluctuation, Mr. R. H. Hooker has suggested
that the "differences between successive values of the two
variables, instead of the differences from the arithmetic
means"* be correlated. Put into mathematical symbols we
have:

Letting $X_0, X_1, \ldots, X_n$ and $X'_0, X'_1, \ldots, X'_n$ represent two
series of measurements, $d_1, d_2, \ldots, d_n$ and $d'_1, d'_2, \ldots, d'_n$
represent differences between any two consecutive measurements,
and $d_m$ and $d'_m$ represent the respective means of these
differences, then

$$d_m = \frac{X_n - X_0}{n} = \frac{\Sigma d}{n}, \qquad d'_m = \frac{X'_n - X'_0}{n} = \frac{\Sigma d'}{n},$$

and the standard deviations of the differences are

$$\sigma_d = \sqrt{\frac{\Sigma (d - d_m)^2}{n}}, \qquad \sigma_{d'} = \sqrt{\frac{\Sigma (d' - d'_m)^2}{n}},$$

and the coefficient of correlation is

$$p = \frac{\Sigma (d - d_m)(d' - d'_m)}{n\,\sigma_d\,\sigma_{d'}} = \frac{\Sigma d d' - n\, d_m d'_m}{\sqrt{(\Sigma d^2 - n\, d_m^2)(\Sigma d'^2 - n\, {d'_m}^2)}}.$$
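
The computation of p from first differences may be sketched as follows. The two series are hypothetical figures chosen so that both rise over the period while their year-to-year movements oppose one another; the function names are illustrative.

```python
import math

def correlation(xs, ys):
    # Ordinary coefficient r computed from deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    # d_i = X_i - X_(i-1)
    return [later - earlier for earlier, later in zip(series, series[1:])]

def difference_correlation(xs, ys):
    # p = (sum dd' - n d_m d'_m) / sqrt((sum d^2 - n d_m^2)(sum d'^2 - n d'_m^2))
    d, e = first_differences(xs), first_differences(ys)
    n = len(d)
    d_m = (xs[-1] - xs[0]) / n   # mean difference = (X_n - X_0) / n
    e_m = (ys[-1] - ys[0]) / n
    numerator = sum(a * b for a, b in zip(d, e)) - n * d_m * e_m
    denominator = math.sqrt((sum(a * a for a in d) - n * d_m ** 2)
                            * (sum(b * b for b in e) - n * e_m ** 2))
    return numerator / denominator

# Hypothetical measurements: common upward tendency, opposed short-time changes.
xs = [10, 12, 11, 15, 14, 18, 17, 21]
ys = [5, 5.5, 7.5, 6.5, 9, 8, 10.5, 9.5]
r = correlation(xs, ys)
p = difference_correlation(xs, ys)
print(round(r, 2), round(p, 2))
```

For these made-up figures r is positive, reflecting the shared long-time rise, while p is strongly negative, reflecting the opposed short-time changes.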

Comparing this method of differences with the method 
described in (1) Mr. Hooker says, "Correlation of the devia- 
tions from an instantaneous average (or trend) may be 
adopted to test the similarity of more or less marked periodic 
influences, correlation of the difference between successive 
values will probably prove most useful where the similarity 
of the shorter rapid changes (with no apparent periodicity) 
are the subject of investigation, or where the normal level of 
one or both series of observations does not remain constant."†
He finds that the ordinary correlation coefficient (r) for the 
price of corn in Iowa and total production in the United States 
for the period 1870-1899 is -0.28, while p = -0.84. 

*Ibid., Vol. 68, p. 697. 
†Ibid., p. 703.
I have computed p for the statistics of corn production
in the United States and the average farm price on December
1* for the period 1866-1906 and find p = -0.833±0.034.
Letting x represent the production difference in millions of 
bushels, and y represent the price difference in cents per 
bushel, the equations of regression are 

y=-0.0256x+1.132 
x=-27.05y+46.42 

A graphic representation of the points whose abscissas and 
ordinates are the corresponding production and price differ- 
ences, respectively, and the line of regression is given on page 
315. The lack of correlation between the original pair of series 
is shown by the chart on page 316. 

From the equations of regression such statements as the 
following can be made:†

(i) For no change in corn production there is an increase in
price of 1.132 cents per bushel. 

(ii) For an increase in production of 100 million bushels the 
price decreases 1.43 cents per bushel. 

(iii) For a decrease in production of 100 million bushels the 
price increases 3.69 cents per bushel. 

(iv) For a stationary price the production must increase 
46 million bushels per year. 
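
Statements (i) to (iv) follow by direct substitution in the two equations of regression. The short sketch below repeats that arithmetic; the function names are illustrative only.

```python
def price_change_cents(production_change_millions: float) -> float:
    # Regression of price difference (cents per bushel) on production
    # difference (millions of bushels): y = -0.0256x + 1.132
    return -0.0256 * production_change_millions + 1.132

def production_change_millions(price_change: float) -> float:
    # Regression of production difference on price difference:
    # x = -27.05y + 46.42
    return -27.05 * price_change + 46.42

print(round(price_change_cents(0), 3))          # statement (i)
print(round(price_change_cents(100), 2))        # statement (ii)
print(round(price_change_cents(-100), 2))       # statement (iii)
print(round(production_change_millions(0), 2))  # statement (iv)
```

Statement (iv) rests on the second equation, not on solving the first for x; the two equations of regression give different answers unless the correlation is perfect.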

It seemed to me that if percentage changes in price and 
production were used instead of absolute changes a still 
closer correlation might result. The computation of p from such 
percentages, however, gave -0.794.

*Statistical Abstract of the United States, 1906, p. 543.

†The writer is making similar computations for wheat, oats, rye, barley, buckwheat,
pig-iron, wool, bituminous coal and anthracite coal. It seems to me that the determination of
the correlation between money and prices might be carried out by this method. After 
having allowed for the influence of changes in the supply of the various articles on those 
articles the influence of the change in the supply of money upon all the articles might be 
determined. 
In the preceding illustrations the amount of correlation
between the differences was greater than that between the 
original series. The method of differences has also been used 
by the writer for Kemmerer's statistics (considered on page 
15 of this article) of (1) money in circulation, and (2) bank 
reserves for the period 1879-1904 with the result p = +0.392, 
whereas the value of r is 0.98. This shows that there is a lack 
of correspondence of the short-time variations in these two 
series. 

3. A third method of eliminating the long-time tendency and
thus isolating the short-time fluctuations is to assume some
curve, represented by an algebraic equation, which "fits"
the statistics in question. The first step in the process is to
select some curve which, for a priori or other reasons, is
considered the best representation of the "growth element."* The
second step is to fit the curve to the statistics; stated
algebraically, to determine the constants in the equation of the
curve by use of the actual data.† Finally the deviations of the
original measurements from the smooth curve (called by
Norton "the growth axis") are computed. The accuracy
with which one law, the geometric, $y = bc^{x}$, describes the
population of the United States, and consequently many things
that depend upon population, is shown by the following
diagram. The full points are fixed by the actual population
according to each of the censuses from 1850. The smooth line is
the graph of the equation

$$y = 24{,}086{,}000\,(1.0238)^{x},$$

which equation was determined from the actual population.
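
The fitting of a geometric growth axis can be sketched as follows. The article does not say how the constants 24,086,000 and 1.0238 were determined; the least-squares fit of a straight line to the logarithms used here is an assumption, as is the reading of x as years elapsed since 1850. To avoid introducing figures not given in the text, the fit is demonstrated on points taken from the stated curve itself.

```python
import math

def fit_geometric(xs, ys):
    # Least squares on log y = log b + x log c; returns (b, c) for y = b * c**x.
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x, mean_log = sum(xs) / n, sum(logs) / n
    slope = (sum((x - mean_x) * (v - mean_log) for x, v in zip(xs, logs))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(mean_log - slope * mean_x), math.exp(slope)

def growth_axis(x, b=24_086_000.0, c=1.0238):
    # The curve quoted in the text, y = 24,086,000 (1.0238)^x.
    return b * c ** x

def deviations_from_axis(xs, ys, b, c):
    # Final step of the method: measurement minus growth axis.
    return [y - growth_axis(x, b, c) for x, y in zip(xs, ys)]

# Assumed time scale: x = 0 at the census of 1850, one unit per year.
years_after_1850 = [0, 10, 20, 30, 40, 50]
on_curve = [growth_axis(x) for x in years_after_1850]
b_fit, c_fit = fit_geometric(years_after_1850, on_curve)
print(round(b_fit), round(c_fit, 4))
```

Applied to actual census figures, `deviations_from_axis` would give the short-time departures from the growth axis that are then correlated.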

*J.P. Norton gives a table of interpolation forms from Steinhauser on p. 25 of the New 
York Money Market. 

†The method of fitting curves to statistics constitutes a separate subject. An extensive
use of mathematics is necessary in order to develop this subject fully. 
[Diagram: population of the United States according to the census, 1850 to 1900 (dots), with the curve y = 24,086,000 (1.0238)^x.]
Prof. J. P. Norton has applied the method here described to
determine the correlation existing between percentage of
reserves to deposits of New York Associated Banks and call
rates.* Weekly statistics were taken for the period 1885-1900.
The growth axes assumed were the geometric curve, $y = bc^{x}$,
and the straight line y = a, respectively (y = the measurement,
x = time measured in weeks, while a, b and c are constants
to be determined from the data). The typical periodic
fluctuations of percentage deviations of reserves and loans
were also correlated by this method, using $y = bc^{x}$ as the growth
function in both cases. The following table gives the
correlation coefficients, p:†



Series        p

[...]         -0.37±0.02
[...]         +0.49±0.07
[...]         +0.62±0.06
[...]         +0.87±0.02
[...]         +0.96±0.01
[...]         +0.91±0.05







The conclusion from this study is that "the loan period 
is really the shadow of the reserve period" . . . and 
apparently follows the latter by "an interval of approximately 
three weeks."‡

Up to this point the problem before us has been the measure- 
ment of the amount of correlation between two variables. 
This is the simplest case of the general problem of the measure- 
ment of the amount of correlation between one series of meas- 
urements, and a group of any number of series of measure- 
ments. The solution of the general problem leads to very 
complex relations, § and it will not be taken up here. The 
case of three variables will be considered briefly. 

*J. P. Norton, Statistical Studies in the New York Money Market. 
†Ibid., p. 96.
‡Ibid., p. 94.

§Yule, G. U., On the Theory of Correlation, Journal of the Royal Statistical Society,
Vol. 60, p. 835.
Messrs. R. H. Hooker and G. U. Yule have considered the
problem, To find the relation between the production of wheat 
in India during the period 1890-1904 (years ending March 
31), the price of wheat (calendar years), and the exports of the 
subsequent twelve months, 1891-1905 (years ending March 
31). The correlation of the annual differences according to 
the method described in (2) of page 312 gives the following 
results: 



Series Correlated                                                       Coefficient of Correlation

1. [...]                                                                +0.77
2. [...]                                                                +0.86
3. Production and Price combined in the ratio 1:1, and Exports          +0.90
4. Production and Price combined in the ratio 3:1, and Exports          +0.81
5. [...]                                                                +0.58







The table indicates that exports depend upon production 
and price, and depend equally upon them. 

Messrs. Hooker and Yule give the following general solution 
of the special problem just considered: 

To find the maximum correlation coefficient between $x_1$
and $x_2 + b x_3$ that results from considering b a variable, where
$x_1$, $x_2$, and $x_3$ are the deviations of the series $X_1$, $X_2$, and $X_3$
from their respective arithmetic averages.

Let $x_2 + b x_3 = z$;

then $$\Sigma(x_1 z) = \Sigma x_1 x_2 + b\,\Sigma x_1 x_3 = n\,(r_{12}\sigma_1\sigma_2 + b\,r_{13}\sigma_1\sigma_3)$$

and $$\Sigma z^2 = n\,(\sigma_2^2 + b^2\sigma_3^2 + 2b\,r_{23}\sigma_2\sigma_3).$$

Hence $$r_{x_1 z} = \frac{r_{12}\sigma_2 + b\,r_{13}\sigma_3}{\sqrt{\sigma_2^2 + b^2\sigma_3^2 + 2b\,r_{23}\sigma_2\sigma_3}}.$$

To find the value of b for which this is a maximum, differentiate
with respect to b and equate to zero; then

$$b = \frac{(r_{13} - r_{12}r_{23})\,\sigma_2}{(r_{12} - r_{13}r_{23})\,\sigma_3},$$

which gives the maximum value

$$r_{x_1 z} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}r_{23}r_{31}}{1 - r_{23}^2}}.$$

Computing $r_{x_1 z}$ from the data of Indian production, price,
and exports of wheat the value 0.905 is obtained.
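
The result can be checked numerically. In the sketch below the three coefficients and the two standard deviations are hypothetical values, not the Indian wheat figures (which are not all legible in the table above); the check is simply that the correlation at the stated b equals the stated maximum and exceeds the correlation at neighbouring values of b.

```python
import math

def combined_correlation(b, r12, r13, r23, s2, s3):
    # Correlation between x1 and z = x2 + b*x3.
    numerator = r12 * s2 + b * r13 * s3
    denominator = math.sqrt(s2 ** 2 + b ** 2 * s3 ** 2 + 2 * b * r23 * s2 * s3)
    return numerator / denominator

def best_weight(r12, r13, r23, s2, s3):
    # Value of b obtained by setting the derivative to zero.
    return (r13 - r12 * r23) * s2 / ((r12 - r13 * r23) * s3)

def maximum_correlation(r12, r13, r23):
    # Closed form for the maximum of the correlation over b.
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r23 * r13) / (1 - r23 ** 2))

# Hypothetical inputs, chosen only to exercise the formulas.
r12, r13, r23, s2, s3 = 0.6, 0.5, 0.3, 2.0, 4.0
b = best_weight(r12, r13, r23, s2, s3)
at_b = combined_correlation(b, r12, r13, r23, s2, s3)
print(round(b, 4), round(at_b, 4), round(maximum_correlation(r12, r13, r23), 4))
```

The maximum depends only on the three coefficients; the standard deviations enter only into the weight b that attains it.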
Mr. G. U. Yule, in the paper already referred to,* has
worked out the general solution of the problem of the
correlation between three variables. In the course of the solution
the problem just considered is solved incidentally. The argument
is similar to that used in the case of two variables and so it
will not be repeated here. A concrete notion of the results
secured by Mr. Yule can be obtained from the following
explanation taken from Mr. Hooker's article on the "Correlation
of the Weather and the Crops."†

*Note on Estimating the Relative Influence of Two Variables upon a Third, Journal of
the Royal Statistical Society, Vol. 69, pp. 197-200.
†Journal of the Royal Statistical Society, Vol. 70, pp. 5 and 6.

"I have in the first place formed the ordinary coefficient

$$r = \frac{\Sigma xy}{n\,\sigma_1\sigma_2}$$

between the crop and (a) rainfall, (b) accumulated
temperature above 42°. But rainfall and temperature are
themselves correlated; hence an apparent influence of, say,
rainfall upon a crop may really be due to rainfall conditions
being dependent upon temperature, or vice versa. Hence it
seemed desirable to calculate the partial or net correlation
coefficients, i. e. (following the notation given in Mr. Yule's
paper of 1897),

$$\rho_{12} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}, \qquad \rho_{13} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}.$$

"This partial coefficient (ρ) may be regarded as a truer
indication of the connection between the crop and each factor
alone, inasmuch as, speaking approximately, we may say that
the effect of the other factor is eliminated. It may be
observed, moreover, that the relative influence of rainfall and
temperature upon the crop is given by $\rho_{12}/\rho_{13}$; or, more
accurately, this fraction measures the relative effect of changes
equal in amount to their respective standard deviations in the
rainfall and temperature. In discussing the figures in the tables I
shall accordingly utilize the partial correlation coefficients
rather than the others. Finally, I have worked out what Mr.
Yule calls the coefficient of double correlation between the
crop and rainfall and accumulated temperature above 42°,

$$R = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}r_{23}r_{13}}{1 - r_{23}^2}},$$

or as it may also be written,

$$R = \sqrt{1 - (1 - r_{12}^2)(1 - \rho_{13}^2)},$$

a form which is quicker to calculate. This may be regarded
as a measure of the joint influence of the rainfall and the
temperature upon the crop. For the sake of brevity, I shall
speak of R as measuring the effect of the 'weather,' using this
term in the strictly limited sense of consisting only of these
two factors. . . .

"I propose to regard a coefficient between 0.3 and 0.5 as
suggestive of dependence. Values below 0.3 I shall, as a rule,
ignore, in the absence of any corroborative evidence. Perhaps
I may remark that I believe that some statisticians would
consider themselves justified in drawing deductions from
lower coefficients than those I have adopted as my limits."*
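
The partial coefficients and the two forms of R can be computed as in the sketch below. The three ordinary coefficients are hypothetical (subscript 1 standing for the crop, 2 for rainfall, 3 for temperature); the point of the example is that the two expressions for R agree.

```python
import math

def partial_correlation(r_ab, r_ac, r_bc):
    # Net correlation of a with b, the influence of c being eliminated.
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def double_correlation(r12, r13, r23):
    # R from the three ordinary coefficients.
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r23 * r13) / (1 - r23 ** 2))

def double_correlation_short(r12, r13, r23):
    # R = sqrt(1 - (1 - r12^2)(1 - rho13^2)), the form described as quicker.
    rho13 = partial_correlation(r13, r12, r23)
    return math.sqrt(1 - (1 - r12 ** 2) * (1 - rho13 ** 2))

# Hypothetical coefficients: crop-rainfall, crop-temperature, rainfall-temperature.
r12, r13, r23 = 0.6, 0.5, 0.3
rho12 = partial_correlation(r12, r13, r23)
rho13 = partial_correlation(r13, r12, r23)
R = double_correlation(r12, r13, r23)
print(round(rho12, 3), round(rho13, 3), round(rho12 / rho13, 3), round(R, 3))
```

The value of R obtained here is the same quantity as the maximum correlation between $x_1$ and $x_2 + b x_3$ derived above, since both expressions reduce to the same function of the three coefficients.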

Mr. Yule notes that the partial or net correlation coefficient
retains three of the chief properties of the ordinary coefficients:
"(1) it can only be zero if both net regressions are zero; (2) it
is a symmetrical function of the variables (i. e., $\rho_{12} = \rho_{21}$);
(3) it cannot be greater than unity."†

The various illustrations which have been cited show the 
importance of questions of correlation in economics. The 
ordinary graphic method of measuring correlation is inade- 
quate. The coefficient of correlation is simple and yet is 
sensitive to small changes. It has been used in many fields 
of statistics by Galton, Pearson, Yule, Hooker, Elderton and 
others. The experience of these writers warrants the adop- 
tion of the coefficient of correlation by economists as one of 
their standard averages.

*Journal of the Royal Statistical Society, Vol. 70, p. 5.
†Ibid., Vol. 60, p. 833.



